691 research outputs found

    Creating sustainable industrial building in China by lowering energy use and improving the working environment

    Over 70% of energy in China is consumed by industrial buildings and related support activities. To reduce production costs, factory owners tend to pay little attention to protecting the environment or workers' health, with serious consequences. Poorly designed buildings require more energy to operate and are more costly to maintain. A lack of natural lighting increases the energy consumed for interior illumination, and improper fenestration design requires more energy to keep buildings cool. An uncomfortable working environment also harms the efficiency and physical health of the labor force. In this thesis, two simple passive strategies for factory buildings, daylighting and passive cooling, are studied and tested separately, with the aim of reducing energy consumption and improving the working environment. The study site is in the Shanghai suburban area, and the object of study is a typical old manufacturing plant still in use, representative of the majority of industrial buildings in China. Rather than demolishing and rebuilding, renovating existing industrial buildings by applying passive strategies has large potential to save energy and improve the working environment. Whether, and by how much, these passive strategies can reduce the energy consumption of existing industrial buildings and improve the working environment is explored by comparing analyses of the existing building with analyses of the renovated building.

    Representation Learning for Scale-free Networks

    Network embedding aims to learn low-dimensional representations of vertexes in a network while preserving the structure and inherent properties of the network. Existing network embedding works primarily focus on preserving the microscopic structure, such as the first- and second-order proximity of vertexes, while the macroscopic scale-free property is largely ignored. The scale-free property refers to the fact that vertex degrees follow a heavy-tailed distribution (i.e., only a few vertexes have high degrees) and is a critical property of real-world networks, such as social networks. In this paper, we study the problem of learning representations for scale-free networks. We first theoretically analyze the difficulty of embedding and reconstructing a scale-free network in Euclidean space by converting our problem to the sphere packing problem. Then, we propose the "degree penalty" principle for designing scale-free-property-preserving network embedding algorithms: punish the proximity between high-degree vertexes. We introduce two implementations of our principle, using spectral techniques and a skip-gram model respectively. Extensive experiments on six datasets show that our algorithms not only reconstruct the heavy-tailed degree distribution, but also outperform state-of-the-art embedding models in various network mining tasks, such as vertex classification and link prediction.
    Comment: 8 figures; accepted by AAAI 2018
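
    As a rough illustration of the degree penalty principle (a sketch of the idea, not necessarily the authors' exact algorithms), the snippet below damps the proximity between high-degree vertexes before a standard spectral embedding; the exponent beta and the dimension dim are arbitrary choices for illustration.

        import numpy as np
        import networkx as nx

        def degree_penalty_embedding(A, dim=8, beta=0.5):
            """Spectral embedding that damps the proximity between
            high-degree vertexes (illustrative degree penalty)."""
            d = A.sum(axis=1)
            d[d == 0] = 1.0                      # guard isolated vertexes
            P = A / np.outer(d, d) ** beta       # penalize high-degree pairs
            P = (P + P.T) / 2.0                  # keep the matrix symmetric
            vals, vecs = np.linalg.eigh(P)
            top = np.argsort(vals)[::-1][:dim]   # leading eigenpairs
            return vecs[:, top] * np.sqrt(np.abs(vals[top]))

        # toy scale-free graph via preferential attachment
        G = nx.barabasi_albert_graph(200, 2, seed=0)
        X = degree_penalty_embedding(nx.to_numpy_array(G))
        print(X.shape)                           # (200, 8)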

    Capturing Evolution Genes for Time Series Data

    Modeling time series is becoming increasingly critical in a wide variety of applications. Overall, data evolves by following different patterns, which are generally caused by different user behaviors. Given a time series, we define the evolution gene to capture the latent user behaviors and to describe how those behaviors lead to the generation of the time series. In particular, we propose a unified framework that recognizes the evolution genes of segments by learning a classifier, and adopts an adversarial generator to implement the evolution gene by estimating the segments' distribution. Experimental results on a synthetic dataset and five real-world datasets show that our approach not only achieves good prediction results (e.g., +10.56% in F1 on average), but also provides explanations of the results.
    Comment: a preprint version. arXiv admin note: text overlap with arXiv:1703.10155 by other authors
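
    A heavily simplified sketch of the segment-level idea: segments are clustered into pseudo-"genes" (clustering is a stand-in here for the paper's adversarial generator), and a classifier learns to recognize the gene of a segment. The regimes, segment width, and features are invented for illustration.

        import numpy as np
        from sklearn.cluster import KMeans
        from sklearn.linear_model import LogisticRegression

        def segment(series, width):
            """Slice a 1-D series into non-overlapping segments."""
            n = len(series) // width
            return series[: n * width].reshape(n, width)

        # toy series alternating between a calm and a bursty regime
        rng = np.random.default_rng(0)
        series = np.concatenate([rng.normal(0.0, s, 50) for s in (0.1, 1.0) * 5])

        segs = segment(series, width=10)
        feats = np.c_[segs.mean(axis=1), segs.std(axis=1)]   # simple segment features
        genes = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(feats)

        # a classifier that recognizes the pseudo-gene of a segment
        clf = LogisticRegression(max_iter=1000).fit(feats, genes)
        print(clf.predict(feats[:3]))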

    Invariant manifolds for stochastic delayed partial differential equations of parabolic type

    The aim of this paper is to prove the existence and smoothness of stable and unstable invariant manifolds for a stochastic delayed partial differential equation of parabolic type. The stochastic delayed partial differential equation is first transformed into a random delayed partial differential equation by a conjugation, which is then recast as an equation in a Hilbert space. For the auxiliary equation, the variation of constants formula holds, and we show the existence of Lipschitz continuous stable and unstable manifolds by the Lyapunov-Perron method. Subsequently, we prove the smoothness of these invariant manifolds under an appropriate spectral gap condition by carefully investigating the smoothness of the auxiliary equation, after which we obtain the invariant manifolds of the original equation by projection and inverse transformation. Finally, we illustrate the theoretical results by applying them to a stochastic single-species population model.
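
    For orientation, the following is a schematic form of the Lyapunov-Perron operator for an abstract evolution equation du/dt = Au + F(u) with hyperbolic linear part; the paper applies a delayed, random version of this construction to the conjugated equation. The notation (spectral projections P_s, P_u onto the stable and unstable subspaces, nonlinearity F, anchor point xi) is generic, not the paper's.

        % schematic Lyapunov-Perron operator (generic notation,
        % not the paper's delayed random version)
        \[
          \mathcal{J}[u](t) \;=\; e^{tA}P_{s}\,\xi
            \;+\; \int_{0}^{t} e^{(t-\tau)A}P_{s}\,F(u(\tau))\,\mathrm{d}\tau
            \;-\; \int_{t}^{\infty} e^{(t-\tau)A}P_{u}\,F(u(\tau))\,\mathrm{d}\tau,
            \qquad t \ge 0.
        \]

    A fixed point of this operator in a suitable weighted function space is an orbit on the stable manifold through xi; the spectral gap condition is what makes the operator a contraction, and the smoothness of the manifolds is inherited from the smoothness of F.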

    The semiannual cycle of sea surface and free air temperatures

    Thesis (M.S.)--Massachusetts Institute of Technology, Dept. of Earth, Atmospheric, and Planetary Sciences, 1995. Includes bibliographical references (leaves 55-57). By Wenjie Hu. M.S.

    Harnessing the Power of Many: Extensible Toolkit for Scalable Ensemble Applications

    Many scientific problems require multiple distinct computational tasks to be executed in order to achieve a desired solution. We introduce the Ensemble Toolkit (EnTK) to address the challenges of scale, diversity, and reliability that such ensemble applications pose. We describe the design and implementation of EnTK, characterize its performance, and integrate it with two distinct exemplar use cases: seismic inversion and adaptive analog ensembles. We perform nine experiments, characterizing EnTK overheads, strong and weak scalability, and the performance of the two use case implementations, at scale and on production infrastructures. We show how EnTK meets the following general requirements: (i) dedicated abstractions to support the description and execution of ensemble applications; (ii) support for execution on heterogeneous computing infrastructures; (iii) efficient scalability up to O(10^4) tasks; and (iv) fault tolerance. We discuss the novel computational capabilities that EnTK enables and the scientific advantages arising from them. We propose EnTK as an important addition to the suite of tools in support of production scientific computing.
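
    A minimal sketch of describing an ensemble with EnTK's published Pipeline/Stage/Task abstractions, assuming the radical.entk package is installed with a working RADICAL-Pilot setup and a reachable RabbitMQ endpoint (the hostname, port, and resource values here are placeholders).

        from radical.entk import Pipeline, Stage, Task, AppManager

        # one pipeline with one stage holding a small ensemble of tasks
        p = Pipeline()
        s = Stage()
        for i in range(16):
            t = Task()
            t.executable = '/bin/echo'
            t.arguments = ['ensemble member', str(i)]
            s.add_tasks(t)
        p.add_stages(s)

        # AppManager needs a reachable RabbitMQ endpoint (assumed local here)
        appman = AppManager(hostname='localhost', port=5672)
        appman.resource_desc = {'resource': 'local.localhost',
                                'walltime': 10, 'cpus': 1}
        appman.workflow = set([p])
        appman.run()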

    Coresets for Wasserstein Distributionally Robust Optimization Problems

    Wasserstein distributionally robust optimization (WDRO) is a popular model for enhancing the robustness of machine learning with ambiguous data. However, the complexity of WDRO can be prohibitive in practice, since solving its "minimax" formulation requires a great amount of computation. Recently, several fast WDRO training algorithms for specific machine learning tasks (e.g., logistic regression) have been developed. However, to the best of our knowledge, research on designing efficient algorithms for general large-scale WDRO problems is still quite limited. The coreset is an important tool for compressing large datasets, and it has been widely applied to reduce the computational complexity of many optimization problems. In this paper, we introduce a unified framework to construct an ε-coreset for general WDRO problems. Though it is challenging to obtain a conventional coreset for WDRO due to the uncertainty of the ambiguous data, we show that a "dual coreset" can be computed by using the strong duality property of WDRO. Moreover, the error introduced by the dual coreset can be theoretically bounded with respect to the original WDRO objective. To construct the dual coreset, we propose a novel grid sampling approach that is particularly suitable for the dual formulation of WDRO. Finally, we implement our coreset approach and illustrate its effectiveness on several WDRO problems in experiments.
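
    The paper's dual coreset is specific to the WDRO dual formulation; as a generic illustration of the grid-sampling flavor only, the sketch below buckets points into a uniform grid and keeps one weighted representative per non-empty cell. The cell count and data are invented, and this is not the paper's construction.

        import numpy as np

        def grid_coreset(X, cells=8, seed=0):
            """Bucket points into a uniform grid and keep one sampled
            representative per non-empty cell, weighted by cell size."""
            rng = np.random.default_rng(seed)
            lo, hi = X.min(axis=0), X.max(axis=0)
            ids = np.floor((X - lo) / (hi - lo + 1e-12) * cells).astype(int)
            ids = np.minimum(ids, cells - 1)          # clamp boundary points
            keys = [tuple(r) for r in ids]
            reps, weights = [], []
            for cell in sorted(set(keys)):
                members = [i for i, k in enumerate(keys) if k == cell]
                reps.append(X[rng.choice(members)])
                weights.append(len(members))
            return np.array(reps), np.array(weights)

        X = np.random.default_rng(1).normal(size=(10000, 2))
        C, w = grid_coreset(X)
        print(len(C), int(w.sum()))    # few points; weights sum to 10000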

    Paradoxes and resolutions for semiparametric fusion of individual and summary data

    Suppose we have individual data from an internal study and various types of summary statistics from relevant external studies. External summary statistics have been used as constraints on the internal data distribution, promising to improve statistical inference in the internal data; however, the additional use of external summary data may lead to paradoxical results: efficiency loss may occur if the uncertainty of the summary statistics is not negligible, and large estimation bias can emerge even if the bias of the external summary statistics is small. We investigate these paradoxical results in a semiparametric framework. We establish the semiparametric efficiency bound for estimating a general functional of the internal data distribution, which is shown to be no larger than the bound using only internal data. We propose a data-fused efficient estimator that achieves this bound, resolving the efficiency paradox. In addition, we propose a debiased estimator that attains selection consistency by employing an adaptive lasso penalty, so that it achieves the same asymptotic distribution as the oracle estimator that uses only unbiased summary statistics; this resolves the bias paradox. Simulations and an application to a Helicobacter pylori infection dataset illustrate the proposed methods.
    Comment: 16 pages, 3 figures
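
    A toy numerical illustration of the efficiency paradox for the simplest case of fusing a mean: weighting a noisy external summary as if it were exact is worse than using internal data alone, while inverse-variance weighting that accounts for the summary's uncertainty helps. This is standard meta-analysis arithmetic, not the paper's semiparametric estimator; all numbers are invented.

        import numpy as np

        rng = np.random.default_rng(0)
        mu, n, reps = 1.0, 200, 5000
        naive, fused = [], []
        for _ in range(reps):
            x = rng.normal(mu, 1.0, n)                 # internal data
            m_int, v_int = x.mean(), 1.0 / n           # internal mean, variance
            m_ext, v_ext = rng.normal(mu, 0.2), 0.04   # noisy external summary
            naive.append(0.5 * m_int + 0.5 * m_ext)    # ignores summary noise
            w = (1 / v_int) / (1 / v_int + 1 / v_ext)  # inverse-variance weight
            fused.append(w * m_int + (1 - w) * m_ext)
        # naive fusion is less efficient than internal data alone (v_int);
        # uncertainty-aware fusion is more efficient
        print(np.var(naive), v_int, np.var(fused))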

    Adaptive Model Predictive Control for Engine-Driven Ducted Fan Lift Systems using an Associated Linear Parameter Varying Model

    Ducted fan lift systems (DFLSs) powered by two-stroke aviation piston engines present a challenging control problem due to their complex multivariable dynamics. Current controllers for these systems typically rely on proportional-integral algorithms combined with data tables, which require accurate models and cannot adapt to time-varying dynamics or system uncertainties. This paper proposes a novel adaptive model predictive control (AMPC) strategy with an associated linear parameter varying (LPV) model for controlling the engine-driven DFLS. The LPV model is derived from a global network model, which is trained off-line with data obtained from a general mean value engine model for two-stroke aviation engines. Several network models, including multi-layer perceptron, Elman, and radial basis function (RBF) networks, are evaluated and compared in this study. The results demonstrate that the RBF model exhibits higher prediction accuracy and robustness in the DFLS application. Based on the trained RBF model, the proposed AMPC approach constructs an associated network that directly outputs the LPV model parameters as an adaptive, robust, and efficient prediction model. The effectiveness of the proposed approach is demonstrated through numerical simulations of a vertical take-off thrust preparation process for the DFLS. The simulation results indicate that the proposed AMPC method can control the DFLS thrust with a relative error below 3.5%.
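
    A minimal numpy sketch of the "associated model" idea under invented dynamics: an RBF network is fitted off-line to map a scheduling variable z to local linear model parameters (a, b), which a predictive controller would then query on-line for prediction. The plant, gamma, and center placement are assumptions for illustration, not the paper's engine model.

        import numpy as np

        def rbf_features(z, centers, gamma=4.0):
            """Gaussian RBF features over the scheduling variable z."""
            return np.exp(-gamma * (z[:, None] - centers[None, :]) ** 2)

        # invented plant: x+ = a(z) x + b(z) u, parameters vary with the
        # operating point z (e.g., engine speed, normalized to [0, 1])
        z_grid = np.linspace(0.0, 1.0, 200)
        a_true = 0.9 - 0.3 * z_grid
        b_true = 0.5 + 0.4 * z_grid

        # off-line fit: RBF weights that map z to the local (a, b)
        centers = np.linspace(0.0, 1.0, 10)
        Phi = rbf_features(z_grid, centers)
        W, *_ = np.linalg.lstsq(Phi, np.c_[a_true, b_true], rcond=None)

        # on-line: query the associated LPV parameters at the current z
        z_now = np.array([0.37])
        a_hat, b_hat = (rbf_features(z_now, centers) @ W)[0]
        x, u = 1.0, 0.2
        print(a_hat * x + b_hat * u)   # one-step prediction an MPC would use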

    Identifying effects of multiple treatments in the presence of unmeasured confounding

    Identification of treatment effects in the presence of unmeasured confounding is a persistent problem in the social, biological, and medical sciences. Unmeasured confounding with multiple treatments is most common in statistical genetics and bioinformatics, where researchers have developed many successful statistical strategies without engaging deeply with the causal aspects of the problem. Recently there have been a number of attempts to bridge the gap between these statistical approaches and causal inference, but these attempts have either been shown to be flawed or have relied on fully parametric assumptions. In this paper, we propose two strategies for identifying and estimating causal effects of multiple treatments in the presence of unmeasured confounding. The auxiliary variables approach leverages variables that are not causally associated with the outcome; in the case of a univariate confounder, our method requires only one auxiliary variable, unlike existing instrumental variable methods that would require as many instruments as there are treatments. An alternative null treatments approach relies on the assumption that at least half of the confounded treatments have no causal effect on the outcome, but does not require a priori knowledge of which treatments are null. Our identification strategies do not impose parametric assumptions on the outcome model and do not rest on estimation of the confounder. This work extends and generalizes existing work on unmeasured confounding with a single treatment and provides a nonparametric extension of models commonly used in bioinformatics.
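
    To fix ideas, consider a linear factor-model special case of the setting; this is only a schematic illustration, since the paper's identification results are nonparametric, and all symbols below are generic.

        % schematic linear factor-model special case (the paper's
        % identification results are nonparametric)
        \[
          A \;=\; \Gamma^{\top}U + e, \qquad
          Y \;=\; \beta^{\top}A + \eta^{\top}U + \varepsilon.
        \]

    Here U is the unmeasured confounder shared by the treatments A. Regressing Y on A alone recovers beta plus a confounding bias induced by eta and Gamma; in this schematic, a null treatments assumption (at least half of the entries of beta are zero) pins beta down as the sparse solution without knowing in advance which entries vanish.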